(54) [Title of the Invention] Voice Synthesizing Device

(57) [Abstract]

[Purpose] 

The present invention relates to a voice synthesizing
device, and particularly to a voice synthesizing device for
synthesizing fixed-form sentences used in voice services such
as traffic information or general weather conditions. Its
object is to synthesize voices with natural prosody that are
easy to hear.
[Structure] 

In a voice synthesizing device that synthesizes sentences
comprised of fixed form parts, representing fixed information
common to all of a group of messages to be synthesized, and
non-fixed form parts, representing variable information that
differs for each message of the group, by smoothly combining
synthesizing units such as syllables or phonemes, the voice
synthesizing device is characterized by comprising, for
generating F0 patterns, which are time-varying patterns of the
basic frequency (the lowest frequency component contained in
voiced sounds): a first F0 pattern generating means for
generating F0 patterns for the fixed form parts; a second F0
pattern generating means for generating F0 patterns for the
non-fixed form parts; a means for generating an F0 pattern for
a sentence by sequentially connecting the F0 patterns generated
by the respective generating means; and a means for
synthesizing voice signals by using that F0 pattern.

[What is claimed is] 

[Claim 1] 

A voice synthesizing device for synthesizing a group of
messages by combining fixed information that is common to
the group of messages to be synthesized and variable information
that differs for each message of the group, the voice
synthesizing device comprising:

a first generating means for generating time-varying
patterns of basic frequencies for the fixed information; a second
generating means for generating time-varying patterns of basic
frequencies for the variable information; an editing means for
generating a time-varying pattern of basic frequency for a
sentence by sequentially connecting the time-varying patterns
of basic frequencies generated by the respective generating means;
and a synthesizing means for synthesizing a voice signal by using
the time-varying pattern of basic frequency generated by the
editing means.
[Claim 2] 

A voice synthesizing device, wherein the first generating
means according to Claim 1 generates the time-varying patterns
of basic frequencies by comprising a means for storing the
time-varying patterns of basic frequencies for the fixed
information, extracted from natural voices, in the form of time
series of basic frequencies, and a means for reading, from the
storing means, a time series of basic frequency that suits an
inputted sentence.
[Claim 3] 

A voice synthesizing device, wherein the first generating
means according to Claim 1 generates the time-varying patterns
of basic frequencies by comprising a means for storing the
time-varying patterns of basic frequencies for the fixed
information, extracted from natural voices, in the form of
parameters of approximate models of the time-varying patterns
of basic frequencies, a means for reading, from the storing
means, a parameter that suits an inputted sentence, and a means
for generating time series of basic frequencies by using the
parameter.
[Claim 4] 

A voice synthesizing device, wherein the second generating
means according to Claim 1 generates the time-varying patterns
of basic frequencies by comprising a means for storing the
time-varying patterns of basic frequencies, extracted from
natural voices, for each combination of number of syllables and
type of accent of the variable information in the form of time
series of basic frequencies, and a means for selecting and
reading, from the storing means, a time series of basic
frequency that suits an inputted sentence.
[Claim 5]

A voice synthesizing device, wherein the second generating
means according to Claim 1 generates the time-varying patterns
of basic frequencies by comprising a means for storing the
time-varying patterns of basic frequencies, extracted from
natural voices, for all combinations of number of syllables and
type of accent of the variable information in the form of
parameters of approximate models of the time-varying patterns
of basic frequencies, a means for selecting and reading, from
the storing means, a parameter that suits an inputted sentence,
and a means for generating time series of basic frequency by
using the parameter.
[Claim 6] 

A voice synthesizing device, wherein the second
generating means according to Claim 1 includes a means for
generating the time-varying patterns of basic frequencies for
the variable information according to rules.
[Claim 7] 

A voice synthesizing device for generating duration time
lengths, each being a sequence of the time lengths of the
respective synthesizing units, the voice synthesizing device
comprising: a first generating means for generating duration
time lengths for fixed information; a second generating means
for generating duration time lengths for variable information;
an editing means for generating a duration time length for a
sentence by sequentially connecting the duration time lengths
generated by the respective generating means; and a means for
synthesizing a voice signal by using that duration time length.
[Claim 8] 

A voice synthesizing device, wherein the first generating
means according to Claim 7 generates the duration time lengths
by comprising a means for storing the duration time lengths for
the fixed information extracted from natural voices and a means
for reading, from the storing means, a duration time length that
suits an inputted sentence.
[Claim 9] 

A voice synthesizing device, wherein the second generating
means according to Claim 7 includes a generating means for
generating duration time lengths for the variable information.
[Claim 10] 

The voice synthesizing device according to Claim 1 or 7,
wherein the voice synthesizing device comprises a text inputting
means that enables fixed information and variable information
to be separated, the voice synthesizing device presenting the
fixed information when a user inputs a sentence to be
synthesized and providing a user interface for inputting and
editing the variable information.
[Claim 11] 

The voice synthesizing device according to Claim 1 or 7,
wherein the voice synthesizing device comprises a selecting
means, which presents the fixed information together with input
candidates for the variable information so that the variable
information can be designated from among these candidates, and
a text inputting means that enables fixed information and
variable information to be separated.

[Detailed Explanation of the Invention] 
[0001] 

[Industrially Applicable Field] 

The present invention relates to a voice synthesizing
device, and particularly to a voice synthesizing device for
synthesizing voices used, for instance, in voice services such
as traffic information or general weather conditions, the voices
being comprised of fixed information that is common to all of a
group of messages to be synthesized (hereinafter referred to as
"fixed form parts") and variable information that is not common
to the group of messages (hereinafter referred to as "non-fixed
form parts").

[0002] 

Demands for labor saving and mechanization in general have
grown steadily in recent years. The field of voice services is
no exception, and voice synthesizing devices are currently used
for voice services such as traffic information, general weather
conditions, and payment inquiry services at banks. Such voice
synthesizing devices are therefore required to provide
synthesized voices with natural prosody that are easy to hear.

[0003] 

[Prior Art] 

A conventional voice synthesizing device employs, for the
fixed form parts, either a recording-editing method, in which
preliminarily recorded voices are reproduced, or an
analyzing-synthesizing method, in which such voices are
converted into some kind of voice parameters and accumulated so
that voices can be synthesized from these parameters. For the
non-fixed form parts, typified by proper nouns or numerals, a
ruled-synthesizing method, which synthesizes voices from
character strings according to rules, was generally employed,
and the voices synthesized by the respective methods were
connected or switched for output.
[0004] 

A structural view of the voice synthesizing device
according to the prior art is illustrated in Fig. 9. In the
drawing, 1 denotes a text inputting means, 2 a text analyzing
means, 3 a fixed form part synthesizing means, 4 a non-fixed
form part synthesizing means, 5 an output voice connecting
means, and 6 a voice outputting means. A text inputted into the
text inputting means 1 is analyzed in the text analyzing means 2
with reference to a word dictionary. The fixed form parts are
then inputted into the fixed form part synthesizing means 3,
where voices are synthesized using voice data accumulated for
the fixed form parts. The parts comprised of variable
information are inputted into the non-fixed form part
synthesizing means 4, which performs ruled-synthesizing from
character strings. The voices synthesized in the respective
synthesizing means are joined into a sentence in the output
voice connecting means 5 and outputted through the voice
outputting means 6.

[0005] 

[Problems to be Solved by the Invention]

However, in terms of voice quality, voices generated by the
ruled-synthesizing method are, at present, inferior to those
generated by the recording-editing method or the
analyzing-synthesizing method.
[0006] 

Thus, connecting fixed form parts obtained by the
recording-editing method or the analyzing-synthesizing method
with non-fixed form parts obtained by the ruled-synthesizing
method had two drawbacks: the difference in quality between
the fixed form parts and the non-fixed form parts was
noticeable in the resulting voices, and the non-fixed form
parts, which carry the important information within the
sentences, were difficult to catch. Conversely, voices in which
the whole sentence is generated with uniform quality are easier
to catch, and owing to the improvement in voice quality of the
ruled-synthesizing method in recent years, sentences synthesized
entirely by the ruled-synthesizing method have also become
sufficiently acceptable for actual use. Employing the
ruled-synthesizing method throughout also, of course, removes
the burden of re-recording voices whenever the fixed form parts
are to be changed.
[0007] 

When voices are to be synthesized from sentences in which
Chinese characters and kana (Japanese syllabary) are mixed, as
in everyday writing, the ruled-synthesizing method, unlike the
recording-editing method or the analyzing-synthesizing method,
must generate natural prosody (intonation, accent, pauses, etc.)
by referring to dictionaries and rules. Two problems arise in
this process.

[0008] 

The first problem arises in the process of generating a
phonetic character string by analyzing a sentence in which
Chinese characters and kana are mixed. In this context, the term
"phonetic character string" indicates a character string in
which notations indicating positions of pauses or accents are
added to a phonemic string (substantially equivalent to "Romaji"
(Latin alphabet) notation in the Japanese language) or to a
syllabic string (substantially equivalent to kana notation in
the Japanese language). Since Japanese is not written with
spaces between words and a single Chinese character may be read
in many ways, misreadings, accent errors, and unnatural pauses
frequently occur when phonetic character strings are generated
by referring to dictionaries and rules. The first problem can
be solved by performing ruled-synthesizing on character strings
extracted from an input string file for voice conversion, i.e.
a storing means in which input character strings including
prosody information have been prepared in advance (reference
should be made to Japanese Patent Unexamined Publication No.
4-107598); however, the cost of constructing such a
configuration still needs to be reduced.

[0009] 

The second problem arises in the process of generating
acoustic (physical) parameters from phonetic character strings.
For instance, intonation, i.e. the variation in voice pitch, is
generally controlled by using time-varying patterns of the basic
frequency, the lowest frequency component contained in voiced
sounds (hereinafter referred to as "F0 patterns"). An F0 pattern
may be represented as a time series of basic frequency values
taken every several milliseconds (msec). Although Fujisaki
models and dot-pitch models are well known as rules for
generating such F0 patterns from the above phonetic character
strings, it is difficult to obtain, from simple rules, F0
patterns that change delicately depending on the complicated
human mechanisms of voice production and on content or meaning.
Further, the time lengths of the respective phonemes or
syllables must be set to suitable values so that the generated
voices sound natural, neither rushed nor drawn out. However,
such time lengths cannot be uniquely determined from the types
of phonemes or syllables alone; they are affected in complex
ways by the positions of the phonemes or syllables within a
sentence and by the surrounding phonemic context, and thus
cannot be obtained through simple rules either.
[0010] 

[Means for Solving the Problems]

Fig. 2 is a conceptual diagram of the present invention. 
Hereinafter, explanations will be made based on an exemplary 
sentence saying "TONIGHT'S WEATHER OF [TOKYO] DISTRICT WILL BE 
[FINE]". 
[0011] 

The sentence is comprised of a fixed form part "TONIGHT'S
WEATHER OF ... DISTRICT WILL BE ..." and non-fixed form parts
[TOKYO] and [FINE], wherein the non-fixed form parts may be
replaced with words such as [KANAGAWA] and [RAINY], respectively.
When such a sentence is synthesized, an F0 pattern and a
duration time length for the fixed form part are extracted from
a voice of a human reading the same sentence, and are stored as
a time series of basic frequency values every several msec in
the case of the F0 pattern, or as a sequence of the lengths of
the respective phonemes in the case of the duration time length.
For the non-fixed form parts, F0 patterns are stored for all
combinations of the number of syllables and the type of accent
of the words or phrases expected to be inputted into each
non-fixed form part, and the F0 pattern for the combination with
the same number of syllables and type of accent is read on the
basis of the inputted sentence or of the phonetic character
strings obtained by analyzing it. Since such F0 patterns are
determined not only by the number of syllables and type of
accent but also by the F0 pattern of the whole sentence, they
differ from one another and need to be selected depending on the
position within the fixed form part into which the word is
inserted. For instance, since the word "TOKYO" has 4 morae and
accent type 0, an F0 pattern for 4 morae and accent type 0 is
selected from among the patterns prepared for insertion at the
position "TONIGHT'S ... DISTRICT" of the fixed form part. The
duration time lengths for the non-fixed form parts are generated
according to rules. By sequentially connecting the F0 patterns
and duration time lengths retrieved (or generated) separately
for the fixed form part and the non-fixed form parts, an F0
pattern for the entire sentence is generated, in which the F0
patterns connect continuously throughout the sentence.
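The slot-based selection described above can be sketched in Python as follows. This is a minimal illustration under assumed data: the contour values, the 5 ms frame period, the dictionary names `FIXED_F0` and `NON_FIXED_F0`, and the second-slot entry are hypothetical and are not taken from this document. Non-fixed patterns are keyed by (slot position, number of morae, accent type), mirroring the selection of a 4-mora, type 0 pattern for "TOKYO" at the "TONIGHT'S ... DISTRICT" position; smoothing at the junctions is omitted.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical stored F0 contours in Hz, one value per assumed 5 ms frame.
FIXED_F0: Dict[str, List[float]] = {
    "TONIGHT'S WEATHER OF": [180.0, 185.0, 190.0, 186.0],
    "DISTRICT WILL BE": [170.0, 165.0, 160.0],
}

# Non-fixed patterns keyed by (slot position, number of morae, accent type).
NON_FIXED_F0: Dict[Tuple[int, int, int], List[float]] = {
    (1, 4, 0): [175.0, 200.0, 198.0, 195.0],  # slot 1, 4 morae, type 0 (e.g. TOKYO)
    (2, 2, 1): [168.0, 150.0],                # hypothetical entry for slot 2
}

# A segment is ("fixed", text) or ("slot", slot_no, n_morae, accent_type).
Segment = Union[Tuple[str, str], Tuple[str, int, int, int]]


def build_sentence_f0(segments: List[Segment]) -> List[float]:
    """Concatenate stored F0 contours for fixed parts and selected slot fillers."""
    f0: List[float] = []
    for seg in segments:
        if seg[0] == "fixed":
            f0.extend(FIXED_F0[seg[1]])
        else:
            _, slot, n_morae, accent_type = seg
            # Select the pattern prepared for this slot with matching morae/accent.
            f0.extend(NON_FIXED_F0[(slot, n_morae, accent_type)])
    return f0


if __name__ == "__main__":
    sentence = [
        ("fixed", "TONIGHT'S WEATHER OF"),
        ("slot", 1, 4, 0),
        ("fixed", "DISTRICT WILL BE"),
        ("slot", 2, 2, 1),
    ]
    print(len(build_sentence_f0(sentence)))  # 13 frames
```

Keying on the slot position as well as on the morae and accent type reflects the point made above that the same word needs a different contour depending on where in the fixed form part it is inserted.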
[0012] 
Even if the F0 patterns for the non-fixed form parts are
generated according to rules instead of being stored, voices of
higher quality can be obtained than when the F0 patterns of
entire sentences are all generated according to rules.

[0013] 

[Action] 

A principle view of the present invention is shown in Fig.
1. In the drawing, 1 denotes a text inputting means, 7 a text
analyzing means, 8 a means of generating F0 patterns/duration
time lengths for the fixed form parts, 9 a means of generating
F0 patterns/duration time lengths for the non-fixed form parts,
10 a means of connecting and editing F0 patterns/duration time
lengths, 11 an acoustic parameter generating means, 12 a voice
signal generating means, and 6 a voice outputting means. A text
to be synthesized is inputted into the text inputting means 1.
The text analyzing means 7 separates the input text into
non-fixed form parts and fixed form parts. If the inputted text
is an ordinary sentence in which Chinese characters and kana are
mixed, this separation requires the text analysis used for
ruled-synthesizing of arbitrary sentences; if, however, the
non-fixed form parts and the fixed form parts can be inputted
separately through the user interface, they are simply passed
to the respective means of generating F0 patterns/duration time
lengths. The text analyzing means 7 further generates phonetic
character strings (phonemic strings or syllabic strings) from
the inputted sentence and outputs them to the acoustic parameter
generating means 11. F0 patterns and duration time lengths are
generated for the fixed form parts in the means of generating F0
patterns/duration time lengths for the fixed form parts 8, and
for the non-fixed form parts in the means of generating F0
patterns/duration time lengths for the non-fixed form parts 9.
These F0 patterns and duration time lengths are sequentially
connected in the means of connecting and editing F0
patterns/duration time lengths 10 to generate an F0 pattern and
a duration time length for the entire sentence. The acoustic
parameter generating means 11 generates acoustic parameters such
as formants on the basis of phonetic character strings such as
phonemic strings or syllabic strings. The acoustic parameters
are determined by the synthesizing method employed in the voice
signal generating means 12. The synthesizing method may also be
a waveform editing method, in which waveforms are edited
directly; in that case, waveform connecting information is
generated in place of acoustic parameters, and such information
is regarded herein as included in the acoustic parameters. The
voice signal generating means 12 generates voice signals from
the F0 patterns, the duration time lengths, and the acoustic
parameters, and the voice signals are outputted from the voice
outputting means 6.

[0014] 

[Embodiments]

It is considered that there are three levels for the F0
pattern generating method. The first level is a method in which
F0 patterns extracted from natural voices are accumulated as
they are, in the form of time series of basic frequencies, and
are read at the time of synthesizing; this method is expected
to synthesize the most natural voices. The second level is a
method in which F0 patterns of natural voices are approximated
by models and the parameters of such models are accumulated,
these parameters being converted into time series of basic
frequencies at the time of synthesizing. The third level is a
method in which model parameters are generated according to
rules on the basis of the text analysis results, and time
series of basic frequencies are generated from these parameters.
[0015] 

Likewise, there are considered to be two levels for the
method of generating duration time lengths. The first level is
a method in which duration time lengths extracted from natural
voices are accumulated as they are, as sequences of time
lengths, and are read at the time of synthesizing. The second
level is a method in which the time lengths are generated
according to rules on the basis of the text analysis results.
Various combinations of the above levels may be considered as
methods for generating F0 patterns and duration time lengths for
the non-fixed form parts and the fixed form parts. These are
described below as embodiments.
[0016] 

A structural view of a first embodiment of the present
invention is illustrated in Fig. 3. This embodiment
corresponds to Claims 2, 4, 8 and 9. In the drawing, 011 denotes
a text inputting unit, 71 denotes a text analyzing unit, 72
denotes a fixed form/non-fixed form determining unit, 73 denotes
an output switching unit, 74 denotes a word dictionary, 75
denotes a unit for accumulating examples of sentences for the
fixed form parts, 81 denotes a unit for reading duration time
lengths for the fixed form parts, 82 denotes a unit for reading
F0 patterns for the fixed form parts, 83 denotes a unit for
accumulating duration time lengths for the fixed form parts, 84
denotes a unit for accumulating F0 patterns for the fixed form
parts, 91 denotes a unit for generating duration time lengths
for the non-fixed form parts, 92 denotes a unit for reading F0
patterns for the non-fixed form parts, 93 denotes an accent
dictionary, 94 denotes a unit for accumulating F0 patterns for
the non-fixed form parts, 101 denotes a unit for connecting and
editing duration time lengths, 102 denotes a unit for connecting
and editing F0 patterns, 111 denotes an acoustic parameter
generating unit, 112 denotes an acoustic parameter accumulating
unit, 121 denotes a voice signal generating unit, and 61 denotes
a voice outputting unit.
[0017] 

F0 patterns for the fixed form parts, extracted in advance
from natural voices, are stored in the unit for accumulating F0
patterns for the fixed form parts 84; for the non-fixed form
parts, F0 patterns for all combinations of the number of
syllables and type of accent are stored in the unit for
accumulating F0 patterns for the non-fixed form parts 94; and
duration time lengths for the fixed form parts, extracted from
natural voices, are stored in the unit for accumulating duration
time lengths for the fixed form parts 83. A text to be
synthesized is inputted into the text inputting unit 011. When a
text in which Chinese characters and kana are mixed is inputted,
the text is analyzed in the text analyzing unit 71 with
reference to the word dictionary 74. In the fixed form/non-fixed
form determining unit 72, reference is made to the examples of
fixed form sentences stored in the unit for accumulating
examples of sentences for the fixed form parts 75 so as to
separate the analysis result into fixed form parts and non-fixed
form parts. The output switching unit 73 outputs the fixed form
parts and the non-fixed form parts to the respective units for
generating duration time lengths and F0 patterns. At the same
time, a phonetic character string of the input text (such as a
phonemic string or a syllabic string) is outputted to the
acoustic parameter generating unit 111 as a result of the text
analysis.
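The role of the fixed form/non-fixed form determining unit 72 can be illustrated with a minimal sketch that matches input against stored sentence examples. It assumes, for illustration only, that each stored example is an English template in which `{name}` marks a non-fixed slot; the actual unit works on the result of Japanese text analysis with the word dictionary 74, which is not modeled here. `TEMPLATES` and `split_fixed_non_fixed` are hypothetical names.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical stored sentence examples; "{name}" marks a non-fixed form slot.
TEMPLATES: List[str] = [
    "TONIGHT'S WEATHER OF {place} DISTRICT WILL BE {weather}",
]

SLOT = re.compile(r"\{(\w+)\}")


def split_fixed_non_fixed(text: str) -> Optional[List[Tuple[str, str]]]:
    """Label the parts of `text` using the first stored example it matches."""
    for template in TEMPLATES:
        # re.split with a capturing group keeps the slot names:
        # even indices are fixed text, odd indices are slot names.
        pieces = SLOT.split(template)
        pattern = "".join(
            re.escape(p) if i % 2 == 0 else f"(?P<{p}>.+?)"
            for i, p in enumerate(pieces)
        )
        match = re.fullmatch(pattern, text)
        if match is None:
            continue
        segments: List[Tuple[str, str]] = []
        for i, p in enumerate(pieces):
            if i % 2 == 1:
                segments.append(("non_fixed", match.group(p)))
            elif p.strip():
                segments.append(("fixed", p.strip()))
        return segments
    return None  # not a known fixed form sentence


if __name__ == "__main__":
    print(split_fixed_non_fixed(
        "TONIGHT'S WEATHER OF KANAGAWA DISTRICT WILL BE RAINY"))
```

A sentence that matches no stored example returns `None`; a real device would then presumably fall back to general text analysis, which this sketch does not attempt.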
[0018] 

For the fixed form parts, the unit for reading duration
time lengths for the fixed form parts 81 reads duration time
lengths from the unit for accumulating duration time lengths
for the fixed form parts 83, while the unit for reading F0
patterns for the fixed form parts 82 reads F0 patterns from the
unit for accumulating F0 patterns for the fixed form parts 84.
These are outputted to the unit for connecting and editing F0
patterns 102 via the unit for connecting and editing duration
time lengths 101. For the non-fixed form parts, the unit for
generating duration time lengths for the non-fixed form parts 91
generates duration time lengths according to rules. Rule-based
generation of duration time lengths is generally performed by
looking up a time length table for each phoneme or syllable of
the non-fixed form parts and then correcting the values
depending on the phonemic context or the like. Next, the unit
for reading F0 patterns for the non-fixed form parts 92 acquires
the accents of the words of the non-fixed form parts from the
accent dictionary 93, refers to the unit for accumulating F0
patterns for the non-fixed form parts 94 on the basis of the
number of syllables and type of accent, and outputs the read F0
patterns to the unit for connecting and editing duration time
lengths 101 and the unit for connecting and editing F0 patterns
102. The unit for connecting and editing duration time lengths
101 sequentially connects the phonemic time lengths of the fixed
form parts and non-fixed form parts to form the sequence of
duration time lengths for the entire sentence. The unit for
connecting and editing F0 patterns 102 sequentially connects the
F0 patterns of the fixed form parts and non-fixed form parts to
form an F0 pattern for the entire sentence. Since the F0 pattern
is continuous during utterance, if the F0 patterns read for the
fixed form parts and the non-fixed form parts do not join
continuously, editing such as suitable smoothing needs to be
performed.
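The two operations described in this paragraph can be sketched as follows: rule-based durations for a non-fixed form part, and concatenation of per-part sequences with a linear ramp across each F0 junction as one possible form of "suitable smoothing". This is a minimal sketch assuming romanized morae, a tiny hypothetical duration table in milliseconds, and a single phrase-final lengthening factor standing in for the context corrections; the table values, the 1.2 factor, and the ramp width are assumptions, not values from this document.

```python
from typing import Dict, List

# Hypothetical base durations (ms), keyed by the final letter of a romanized
# mora: its vowel, or N (moraic nasal) / Q (geminate).
BASE_MS: Dict[str, float] = {
    "a": 110.0, "i": 95.0, "u": 90.0, "e": 105.0, "o": 110.0,
    "N": 80.0, "Q": 70.0,
}


def rule_durations(morae: List[str], final_lengthening: float = 1.2) -> List[float]:
    """Table lookup per mora, then one illustrative context correction."""
    durs = [BASE_MS.get(m[-1], 100.0) for m in morae]
    if durs:
        durs[-1] *= final_lengthening  # assumed phrase-final lengthening
    return durs


def connect_durations(parts: List[List[float]]) -> List[float]:
    """Sequentially connect per-part duration sequences into one sentence sequence."""
    out: List[float] = []
    for part in parts:
        out.extend(part)
    return out


def connect_f0_smoothed(parts: List[List[float]], width: int = 2) -> List[float]:
    """Concatenate F0 contours and replace a few frames around each junction
    with a straight line, removing jumps between separately generated parts."""
    f0: List[float] = []
    junctions: List[int] = []
    for part in parts:
        if f0 and part:
            junctions.append(len(f0))  # index of the first frame of the new part
        f0.extend(part)
    for j in junctions:
        lo = max(0, j - width)
        hi = min(len(f0) - 1, j + width - 1)
        n = hi - lo
        a, b = f0[lo], f0[hi]
        for k in range(1, n):
            f0[lo + k] = a + (b - a) * k / n
    return f0


if __name__ == "__main__":
    print(rule_durations(["to", "u", "kyo", "u"]))
    print(connect_f0_smoothed([[100.0] * 4, [200.0] * 4]))
```

The sketch assumes every frame is voiced; a practical editor would also have to skip unvoiced frames (often stored as 0 Hz) when interpolating.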
[0019] 

On the other hand, the acoustic parameter generating unit
111 generates acoustic parameters based on the input phonetic
character strings. The acoustic parameter accumulating unit 112
stores acoustic parameters. The term "acoustic parameters" as
used herein indicates voice data expressed numerically by means
of voice production models in order to compress the data volume,
and various types such as formants, PARCOR, and LSP are known.
Synthesizing methods using such acoustic parameters are referred
to as formant synthesis, PARCOR synthesis, and LSP synthesis,
respectively, and are realized by the voice signal generating
unit 121. Another synthesizing method is a waveform editing
method in which waveforms are edited directly; although waveform
connecting information is generated in place of acoustic
parameters in such a method, such information is regarded herein
as included in the acoustic parameters. Acoustic parameters are
accumulated per phonetic character, or per more finely
differentiated units depending on the preceding and following
phonemic context. By reading and connecting these in accordance
with the phonetic character strings, acoustic parameter strings
for the synthesized sentence can be generated. The voice signal
generating unit 121 generates voice signals using the duration
time lengths, F0 patterns, and acoustic parameter strings
generated above for the synthesized sentence. The voice
outputting unit 61 performs D/A conversion and outputs the voice
signal as a synthesized voice.
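The final signal generation step can be illustrated with a generic source-filter sketch: a pulse train at the F0 of each frame (or noise for unvoiced frames, marked by F0 = 0) excites an all-pole filter. This is a simplification under stated assumptions, not the device's formant, PARCOR, or LSP synthesizer: PARCOR and LSP parameters would in practice be converted to such filter coefficients and updated per frame, whereas here a single hypothetical coefficient set is held constant. The 8 kHz sampling rate and 5 ms frame period are also assumed.

```python
import random
from typing import List


def synthesize(
    f0_frames: List[float],
    lpc: List[float],
    fs: int = 8000,
    frame_ms: int = 5,
    seed: int = 0,
) -> List[float]:
    """Pulse-train / noise excitation through y[n] = x[n] - sum_k a_k * y[n-k].

    f0_frames: F0 in Hz per frame; 0.0 marks an unvoiced frame.
    lpc: a_1..a_p, held constant here for brevity.
    """
    rng = random.Random(seed)
    hop = fs * frame_ms // 1000          # samples per frame
    hist = [0.0] * len(lpc)              # y[n-1], y[n-2], ...
    phase = 0.0
    out: List[float] = []
    for f0 in f0_frames:
        for _ in range(hop):
            if f0 > 0:
                phase += f0 / fs         # fraction of a pitch period advanced
                x = 0.0
                if phase >= 1.0:         # one pulse per completed period
                    phase -= 1.0
                    x = 1.0
            else:
                x = rng.gauss(0.0, 0.1)  # noise excitation for unvoiced frames
            y = x - sum(a * h for a, h in zip(lpc, hist))
            if hist:
                hist = [y] + hist[:-1]
            out.append(y)
    return out


if __name__ == "__main__":
    # Hypothetical second-order resonator (poles inside the unit circle).
    signal = synthesize([150.0] * 10 + [0.0] * 10, lpc=[-1.3, 0.8])
    print(len(signal))  # 20 frames x 40 samples = 800
```

The F0 pattern and the duration time lengths enter this step only through the per-frame F0 list; mapping durations onto frames and per-unit acoustic parameters is left out to keep the sketch short.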

[0020] 

A structural view of the second embodiment of the present
invention is illustrated in Fig. 4. This embodiment corresponds
to Claims 3 and 5. In the present embodiment, the unit for
reading F0 patterns for the fixed form parts 82 and the unit for
accumulating F0 patterns for the fixed form parts 84 are
replaced by a unit for reading F0 parameters for the fixed form
parts 85, a unit for generating F0 patterns for the fixed form
parts 86, and a unit for accumulating F0 parameters for the
fixed form parts 87, while the unit for reading F0 patterns for
the non-fixed form parts 92 and the unit for accumulating F0
patterns for the non-fixed form parts 94 are replaced by a unit
for reading F0 parameters for the non-fixed form parts 95, a
unit for generating F0 patterns for the non-fixed form parts 96,
and a unit for accumulating F0 parameters for the non-fixed form
parts 97.
[0021] 

In this embodiment, F0 patterns extracted in advance from
natural voices are approximated by models, and the model
parameters are accumulated in the unit for accumulating F0
parameters for the fixed form parts 87 and the unit for
accumulating F0 parameters for the non-fixed form parts 97. In
synthesizing voices, for the fixed form parts, the unit for
reading F0 parameters for the fixed form parts 85 reads the F0
parameters for the fixed form parts from the unit for
accumulating F0 parameters for the fixed form parts 87, and the
unit for generating F0 patterns for the fixed form parts 86
generates time series of basic frequencies (F0 patterns) from
these parameters. Similarly, for the non-fixed form parts, the
unit for reading F0 parameters for the non-fixed form parts 95
acquires the accents of the words of the non-fixed form parts
from the accent dictionary 93 and reads the proper F0 parameters
from the unit for accumulating F0 parameters for the non-fixed
form parts 97 depending on the number of syllables and type of
accent, and the unit for generating F0 patterns for the
non-fixed form parts 96 generates time series of basic
frequencies (F0 patterns) from them.
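This embodiment does not name the approximate model. As one possibility, assuming the Fujisaki model mentioned in the description of the second problem, the conversion performed by the generating units 86 and 96 from stored parameters to a time series of basic frequencies could look like the sketch below. The model computes ln F0(t) = ln Fb + sum_i Ap_i Gp(t - T0_i) + sum_j Aa_j [Ga(t - T1_j) - Ga(t - T2_j)], with Gp(t) = alpha^2 t exp(-alpha t) and Ga(t) = min(1 - (1 + beta t) exp(-beta t), gamma) for t >= 0 (both zero for t < 0). The constants alpha = 3.0 /s, beta = 20.0 /s and gamma = 0.9 are commonly quoted defaults, and the example parameter values are hypothetical.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FujisakiParams:
    fb: float                                                               # base frequency Fb (Hz)
    phrase: List[Tuple[float, float]] = field(default_factory=list)         # (T0 s, Ap)
    accent: List[Tuple[float, float, float]] = field(default_factory=list)  # (T1 s, T2 s, Aa)
    alpha: float = 3.0                                                      # phrase control constant (1/s)
    beta: float = 20.0                                                      # accent control constant (1/s)
    gamma: float = 0.9                                                      # accent ceiling


def _gp(t: float, alpha: float) -> float:
    """Phrase control: alpha^2 * t * exp(-alpha * t) for t >= 0, else 0."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def _ga(t: float, beta: float, gamma: float) -> float:
    """Accent control: min(1 - (1 + beta*t) * exp(-beta*t), gamma) for t >= 0, else 0."""
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma) if t >= 0 else 0.0


def f0_contour(p: FujisakiParams, duration_s: float, frame_ms: float = 5.0) -> List[float]:
    """Evaluate the model every frame_ms and return F0 values in Hz."""
    n_frames = int(round(duration_s * 1000.0 / frame_ms))
    contour: List[float] = []
    for i in range(n_frames):
        t = i * frame_ms / 1000.0
        ln_f0 = math.log(p.fb)
        ln_f0 += sum(ap * _gp(t - t0, p.alpha) for t0, ap in p.phrase)
        ln_f0 += sum(
            aa * (_ga(t - t1, p.beta, p.gamma) - _ga(t - t2, p.beta, p.gamma))
            for t1, t2, aa in p.accent
        )
        contour.append(math.exp(ln_f0))
    return contour


if __name__ == "__main__":
    params = FujisakiParams(fb=80.0, phrase=[(0.0, 0.5)], accent=[(0.2, 0.6, 0.4)])
    f0 = f0_contour(params, duration_s=1.0)
    print(len(f0), round(f0[0], 1))  # 200 80.0
```

Storing a handful of command times and amplitudes per pattern, rather than hundreds of frame values, is the kind of saving the second level is meant to provide; fitting those parameters to natural F0 patterns is a separate analysis step not shown here.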

[0022] 

A structural view of the third embodiment of the present
invention is illustrated in Fig. 5. This embodiment corresponds
to Claim 6. In the present embodiment, the unit for reading F0
patterns for the non-fixed form parts 92 and the unit for
accumulating F0 patterns for the non-fixed form parts 94 are
replaced by a unit for generating F0 patterns for the non-fixed
form parts 98. Since the processes of the remaining units are
identical to those of the first embodiment, only the unit for
generating F0 patterns for the non-fixed form parts 98 will be
explained here.
[0023] 

The unit for generating F0 patterns for the non-fixed form
parts 98 acquires the accents of the words of the non-fixed form
parts from the accent dictionary 93 and generates F0 patterns
according to rules, taking into account their positions within
the sentence. As the method for generating F0 patterns according
to rules, methods employing models such as Fujisaki models or
dot-pitch models are commonly used, and these are also
applicable in the present case.
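As a simpler stand-in for the rule-based models named above, the sketch below generates the F0 pattern of a non-fixed form part from pitch targets: one high or low target per mora following the standard Tokyo Japanese accent types (type 0 rises after the first mora and stays high, type 1 is high only on the first mora, type n is high from the second mora through the n-th), with the high target lowered for later positions in the sentence. It is neither the Fujisaki model nor a dot-pitch model, and all frequencies, the declination factor, and the mora durations are hypothetical.

```python
import bisect
from typing import List


def accent_levels(n_morae: int, accent_type: int) -> List[int]:
    """1 = high, 0 = low for each mora, following Tokyo Japanese accent types."""
    levels = []
    for i in range(1, n_morae + 1):
        if accent_type == 0:        # flat: low first mora, high afterwards
            high = i >= 2
        elif accent_type == 1:      # head-high: only the first mora is high
            high = i == 1
        else:                       # type n: high from mora 2 through mora n
            high = 2 <= i <= accent_type
        levels.append(1 if high else 0)
    return levels


def rule_f0(
    n_morae: int,
    accent_type: int,
    mora_ms: List[float],
    position: int,
    frame_ms: float = 5.0,
    low_hz: float = 120.0,
    high_hz: float = 180.0,
    declination: float = 0.85,
) -> List[float]:
    """Interpolate per-mora pitch targets (placed at mora centres) every frame_ms.

    position: 0 for the first phrase in the sentence; each later phrase scales
    the high-minus-low excursion by `declination` (an assumed rule).
    """
    if len(mora_ms) != n_morae:
        raise ValueError("need one duration per mora")
    peak = low_hz + (high_hz - low_hz) * declination ** position
    targets = [peak if lv else low_hz for lv in accent_levels(n_morae, accent_type)]
    centres, t = [], 0.0
    for d in mora_ms:
        centres.append(t + d / 2.0)
        t += d
    contour = []
    for k in range(int(t // frame_ms)):
        tk = k * frame_ms
        if tk <= centres[0]:
            contour.append(targets[0])
        elif tk >= centres[-1]:
            contour.append(targets[-1])
        else:
            j = bisect.bisect_right(centres, tk) - 1   # last centre at or before tk
            w = (tk - centres[j]) / (centres[j + 1] - centres[j])
            contour.append(targets[j] + w * (targets[j + 1] - targets[j]))
    return contour


if __name__ == "__main__":
    # "TOKYO": 4 morae, accent type 0, hypothetical 100 ms per mora, first phrase.
    contour = rule_f0(4, 0, [100.0] * 4, position=0)
    print(len(contour))  # 80 frames for 400 ms
```

The `position` argument is the sketch's crude counterpart of "taking into account their positions within the sentence"; the rules an actual device applies would be considerably richer.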
[0024]

A structural view of the fourth embodiment of the present
invention is shown in Fig. 6. This embodiment corresponds
to Claims 10 and 11. In the present embodiment, the user
interface of the text inputting unit of the first embodiment is
replaced so that more accurate text analysis becomes possible.
An input interface unit 012 reads the fixed form parts from a unit
for accumulating examples of sentences for the fixed form parts
013 and displays them as a user interface, as illustrated in
Fig. 7 or Fig. 8. In Fig. 7, the fixed form parts are provided
with display-only fields, while the non-fixed form parts are
provided with editable fields in which words can be freely input
and edited, so that the user inputs only the non-fixed form parts.
When input is performed through such an interface, it is no longer
necessary to determine which parts are fixed and which are
non-fixed, and text analysis can be performed by looking up only
the fixed form parts in the word dictionary 74.
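The Fig. 7 interface can be sketched as a sentence template whose fields are flagged as display-only (fixed form parts) or editable (non-fixed form parts). This is a minimal illustration, not the patent's implementation: the `Field` record, `TEMPLATE`, `fill`, and `split_parts` are hypothetical names, and the template text is taken from the Fig. 7 example sentence. It shows why no separate fixed/non-fixed determination step is needed: the classification is carried by the template itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    text: str       # displayed text, or the value entered in an editable field
    editable: bool  # False: fixed form part (display only); True: non-fixed form part


# Hypothetical template for the Fig. 7 sentence
# "TONIGHT'S WEATHER OF [TOKYO] DISTRICT WILL BE [FINE]."
TEMPLATE = [
    Field("TONIGHT'S WEATHER OF", editable=False),
    Field("TOKYO", editable=True),
    Field("DISTRICT WILL BE", editable=False),
    Field("FINE", editable=True),
]


def fill(template: list[Field], values: list[str]) -> list[Field]:
    """Replace the editable fields in order; display-only fields pass through unchanged."""
    slots = sum(f.editable for f in template)
    if len(values) != slots:
        raise ValueError(f"expected {slots} values, got {len(values)}")
    it = iter(values)
    return [Field(next(it), True) if f.editable else f for f in template]


def split_parts(fields: list[Field]) -> tuple[list[str], list[str]]:
    """Return (fixed form parts, non-fixed form parts) straight from the field flags."""
    fixed = [f.text for f in fields if not f.editable]
    non_fixed = [f.text for f in fields if f.editable]
    return fixed, non_fixed


fixed, non_fixed = split_parts(fill(TEMPLATE, ["KANAGAWA", "RAINY"]))
```

Under these assumptions only the `fixed` list would be sent to the dictionary-based text analysis, matching the paragraph above.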
[0025] 

In Fig. 8, input candidates for the non-fixed form parts are
accumulated in the unit for accumulating examples of sentences for
the fixed form parts 013, and the interface is arranged so that,
when a field for a non-fixed form part is designated, the input
candidates for that position are displayed and the user can
designate which of them is to be input by means of a candidate
selecting means. In this case as well, it is not necessary to
determine which parts are fixed and which are non-fixed, and text
analysis can be performed by looking up only the fixed form parts
in the word dictionary 74. The subsequent processes are identical
to those of the other embodiments.
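The candidate-selection interface of Fig. 8 can be sketched as a lookup of stored candidates per non-fixed field. This is a hypothetical illustration: the `CANDIDATES` store, the slot names, and the functions are not from the patent. The district candidates are copied from the Fig. 8 drawing labels, and the weather candidates FINE and RAINY are borrowed from Fig. 2 as an assumption.

```python
# Hypothetical candidate store for the non-fixed fields of the Fig. 8 sentence.
CANDIDATES: dict[str, list[str]] = {
    "district": ["TOKYO", "KANAGAWA", "SAITAMA", "SOUTHERN CHIBA",
                 "NORTHERN CHIBA", "SOUTHERN IBARAGI", "NORTHERN IBARAGI",
                 "SOUTHERN TOCHIGI", "NORTHERN TOCHIGI", "OTHERS"],
    "weather": ["FINE", "RAINY"],
}


def select_candidate(slot: str, index: int) -> str:
    """Return the candidate the user designated; only stored words can be entered."""
    options = CANDIDATES[slot]
    if not 0 <= index < len(options):
        raise IndexError(f"slot {slot!r} has {len(options)} candidates")
    return options[index]


def render(district: str, weather: str) -> str:
    """Insert the selected words into the fixed form parts of the sentence."""
    return f"TONIGHT'S WEATHER OF {district} DISTRICT WILL BE {weather}."


sentence = render(select_candidate("district", 1), select_candidate("weather", 1))
```

Compared with the free-editing fields of Fig. 7, restricting input to a stored list means every word that can reach the synthesizer is known in advance.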

[0026] 

[Effect of the Invention] 

As explained above, according to the present invention, a voice
synthesizing device for synthesizing voices of fixed sentences,
used particularly for voice services such as traffic information
or general weather conditions, can synthesize voices with natural
prosody that are easy to hear.
[Brief Explanation of the Drawings] 

[Fig. 1] 

It shows a principle view of the present invention. 
[Fig. 2] 

It shows a conceptual view illustrating basic ideas of 
the present invention. 
[Fig. 3] 

It shows a first embodiment of the present invention. 
[Fig. 4] 

It shows a second embodiment of the present invention. 
[Fig. 5] 

It shows a third embodiment of the present invention. 
[Fig. 6] 

It shows a fourth embodiment of the present invention. 
[Fig. 7] 
It shows a first example of a user interface of the present
invention.
[Fig. 8] 

It shows a second example of a user interface of the present
invention.
[Fig. 9] 

It shows the prior art.
[Explanations of Reference Numerals] 
1 Text inputting means 

2, 7 Text analyzing means 

3 Fixed form part synthesizing means 

4 Non-fixed form part synthesizing means 

5 Output voice connecting means 

6 Voice outputting means 

8 Means of generating F0 patterns/duration time lengths for
the fixed form parts

9 Means of generating F0 patterns/duration time lengths for
the non-fixed form parts

10 Means of connecting and editing F0 patterns/duration time
lengths (abbreviated as editing means)

11 Acoustic parameter generating means 

12 Voice signal generating means 
61 Voice outputting unit 

71, 71' Text analyzing unit 

72 Fixed form/non-fixed form determining unit 
73 Output switching unit

74 Word dictionary 

75, 013 Unit for accumulating examples of sentences for the
fixed form parts 

81 Unit for reading duration time lengths for the fixed form 
parts 

82 Unit for reading F0 patterns for the fixed form parts 

83 Unit for accumulating duration time lengths for the fixed 
form parts 

84 Unit for accumulating F0 patterns for the fixed form parts

85 Unit for reading F0 parameters for the fixed form parts 

86 Unit for generating F0 patterns for the fixed form parts 

87 Unit for accumulating F0 parameters for the fixed form 
parts 

91 Unit for generating duration time lengths for the non-fixed
form parts

92 Unit for reading F0 patterns for the non-fixed form parts 

93 Accent dictionary 

94 Unit for accumulating F0 patterns for the non-fixed form 
parts 

95 Unit for reading F0 parameters for the non-fixed form parts 
96, 98 Unit for generating F0 patterns for the non-fixed 
form parts 

97 Unit for accumulating F0 parameters for the non-fixed form 
parts 
011 Text inputting unit

012 Input interface unit 

101 Unit for connecting and editing duration time lengths 

102 Unit for connecting and editing F0 patterns 

111 Acoustic parameter generating unit 

112 Acoustic parameter accumulating unit 
121 Voice signal generating unit 
Translation of drawings

[FIG. 1]

(1) Principle view of the present invention

(2) Text inputting means

(3) Text analyzing means

(4) Means of generating F0 patterns/duration time lengths for
the fixed form parts

(5) Means of generating F0 patterns/duration time lengths for
the non-fixed form parts

(6) Means of connecting and editing F0 patterns/duration time
lengths

(7) Acoustic parameter generating means 

(8) Voice signal generating means 

(9) Voice outputting means 
[FIG. 2]

(10) Conceptual view illustrating basic ideas of the present 
invention 

(11) Frequency 

(12) Fixed form parts 

(13) TONIGHT'S 

(14) WEATHER OF . . . DISTRICT 

(15) WILL BE 

(16) Time 

(17) Non-fixed form parts 

(18) Pattern for 4 morae, accent type 0

(19) Pattern for 2 morae, accent type 1

(20) TOKYO 

(21) KANAGAWA PREFECTURE 

(22) FINE 

(23) RAINY 
[FIG. 3] 

(1) First embodiment of the present invention 

(2) Text inputting unit

(3) Text analyzing unit 

(4) Fixed form/non-fixed form determining unit

(5) Output switching unit 

(6) Word dictionary 

(7) Unit for accumulating examples of sentences for the fixed 
form parts 

(8) Unit for accumulating duration time lengths for the fixed 
form parts 

(9) Unit for accumulating F0 patterns for the fixed form parts 

(10) Unit for reading duration time lengths for the fixed form 
parts 

(11) Unit for reading F0 patterns for the fixed form parts 

(12) Unit for generating duration time lengths for the non-fixed
form parts 

(13) Unit for reading F0 patterns for the non-fixed form parts 

(14) Accent dictionary 

(15) Unit for accumulating F0 patterns for the non-fixed form 
parts

(16) Unit for connecting and editing duration time lengths

(17) Unit for connecting and editing F0 patterns

(18) Acoustic parameter generating unit 

(19) Acoustic parameter accumulating unit 

(20) Voice signal generating unit 

(21) Voice outputting unit 
[FIG. 4] 

(1) Second Embodiment of the present invention 

(2) From output switching unit 73

(3) Unit for accumulating duration time lengths for the fixed
form parts

(4) Unit for accumulating F0 parameters for the fixed form
parts

(5) Unit for reading duration time lengths for the fixed form
parts

(6) Unit for reading F0 parameters for the fixed form parts

(7) Unit for generating F0 patterns for the fixed form parts

(8) Unit for generating duration time lengths for the non-fixed
form parts

(9) Unit for reading F0 parameters for the non-fixed form parts

(10) Unit for generating F0 patterns for the non-fixed form
parts

(11) Accent dictionary

(12) Unit for accumulating F0 parameters for the non-fixed form
parts

(13) To unit for connecting and editing duration time lengths 
101 

[FIG. 5] 

(14) Third Embodiment of the present invention 

(15) From output switching unit 73 

(16) Unit for generating duration time lengths for the non-fixed 
form parts 

(17) Unit for generating F0 patterns for the non-fixed form 
parts 

(18) Accent dictionary 

(19) To unit for connecting and editing duration time lengths 
101 

[FIG. 6] 

(20) Fourth embodiment of the present invention 

(21) Input interface unit 

(22) Unit for accumulating examples of sentences for the fixed 
form parts 

(23) Text analyzing unit 

(24) Output switching unit 

(25) Word dictionary 

(26) To unit for reading duration time lengths for the fixed 
form parts 81 

(27) To unit for reading duration time lengths for the non-fixed 
form parts 91 
(28) To acoustic parameter generating unit 111
[FIG. 7] 

(29) First example of a user interface of the present invention 

(30) Non-fixed form parts (editable part) 

(31) Fixed form parts 

(32) TONIGHT'S WEATHER OF [TOKYO] DISTRICT WILL BE [FINE]. 
[FIG. 9] 

(33) Prior Art 

(34) Text inputting means 

(35) Text analyzing means 

(36) Fixed form part synthesizing means 

(37) Non-fixed form part synthesizing means 

(38) Output voice connecting means 

(39) Voice outputting means 
[FIG. 8] 

(40) Second example of user interface of the present invention 

(41) TONIGHT'S WEATHER OF [...] DISTRICT WILL BE [FINE] 

(42) TOKYO/KANAGAWA/SAITAMA/SOUTHERN CHIBA/NORTHERN
CHIBA/SOUTHERN IBARAGI/NORTHERN IBARAGI/SOUTHERN
TOCHIGI/NORTHERN TOCHIGI/OTHERS